-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix has_null predicate for drop_list_duplicates on nested structs #10436
Fix has_null predicate for drop_list_duplicates on nested structs #10436
Conversation
Signed-off-by: sperlingxx <[email protected]>
Codecov Report
@@ Coverage Diff @@
## branch-22.04 #10436 +/- ##
================================================
+ Coverage 86.13% 86.18% +0.04%
================================================
Files 139 139
Lines 22438 22470 +32
================================================
+ Hits 19328 19365 +37
+ Misses 3110 3105 -5
Continue to review full report at Codecov.
|
{ | ||
using ColWrapper = cudf::test::fixed_width_column_wrapper<TypeParam, int32_t>; | ||
auto constexpr XXX = int32_t{0}; // nulls at the parent structs column level | ||
auto constexpr YYY = int32_t{0}; // nulls at the parent structs column level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm struggling with the distinction between XXX and YYY here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidwendt Sorry, it is my blunder. They should be assigned with different values to create intra null structs with
children of different values.
Co-authored-by: Nghia Truong <[email protected]>
Signed-off-by: sperlingxx <[email protected]>
Co-authored-by: Nghia Truong <[email protected]>
@gpucibot rerun tests |
1 similar comment
@gpucibot rerun tests |
@gpucibot rerun tests |
@gpucibot merge |
For now,
get_indices_of_unique_entries_dispatch
only checks the outest layer of structures to determine whether the column contains null or not. It leads to the omission of intra null structures.In current,
[Struct(Struct(1, "a", valid)), Struct(Struct(1, "a", valid), Struct(Struct(2, "a", invalid), Struct(Struct(1, "b", invalid)]
becomes[Struct(Struct(1, "a", valid)), Struct(Struct(2, "a", invalid), Struct(Struct(1, "b", invalid)]
after droping list duplicates even if null equality is EQUAL. However, it should be[Struct(Struct(1, "a", valid)), Struct(Struct(2, "a", invalid)]
, becauseStruct(1, "a")
andStruct(2, "a")
are both invalid, despite they contain different elements.Current PR is to fix above problem by checking all flattened coulumns for nulls.